We study the mutual information between a stimulus and a system consisting of stochastic,
statistically independent elements that respond to the stimulus. Using statistical mechanical
methods, we calculate the properties of the Mutual Information (MI) in the limit of a large
system size, $N$. For continuous valued stimuli, the MI increases logarithmically with $N$ and is
related to the log of the Fisher Information of the system. For discrete stimuli the MI
saturates exponentially with $N$. We find that the exponent of saturation of the MI is the
Chernoff Distance between response probabilities that are induced by different stimuli.
Population coding serves as a common paradigm of information processing in the brain. Its
starting point is the fact that very often the response of a single neuron to a stimulus is
noisy and is only weakly tuned to changes in the stimulus value. Hence, the information
carried by a typical neuron is rather low. The brain may overcome this limitation by
distributing the information across a large number of neurons, which together carry accurate
information about the stimulus. This paradigm has inspired numerous studies of the
statistical efficiency of population codes. Of particular interest is the way in which the
amount of information about the stimulus depends on both the size of the responding
population and the response properties of the individual neurons.

Many theoretical studies have employed the well-known concept of Fisher Information (FI)
[11]. The FI is related to the derivative of the population response probability with respect
to the stimulus. For a statistically independent population, the FI is an extensive quantity
and is relatively easy to calculate. However, it is restricted to the case of a continuously
varying stimulus. An alternative measure, applicable for arbitrary stimulus spaces, is
provided by Shannon Mutual Information (MI). Unfortunately, except for special cases, exact
calculation of the MI for a large population is difficult even for independent populations.
This is because the MI is bounded from above by the stimulus entropy, and thus is not an
extensive quantity. Recently, a relationship between the MI of a continuous stimulus and the
FI in a large population has been derived. However, little theoretical progress has been made
on the properties of the MI in large systems with discrete stimuli.

In this paper we introduce statistical mechanical methods to calculate analytically the
behavior of the MI as the size, $N$, of the population grows. For continuous valued stimuli
our theory yields a logarithmic dependence of the MI on the FI, in agreement with previous
results. In the case of discrete stimuli, the MI saturates exponentially fast with $N$. We
show that the exponential saturation rate is dominated by a contribution from the two
stimulus values that induce the closest response probabilities in the population. The
contribution of this pair of stimuli to the saturation rate equals the Chernoff Distance,
known from Large Deviation Theory, between the corresponding response probabilities.

We consider a population of $N$ stochastic units, which we call neurons, that respond
simultaneously to a presentation of a stimulus. We denote their responses by a vector
$\mathbf{r} = \{r_1, r_2, \ldots, r_N\}$, where $r_i$ represents the stochastic response of
the $i$-th neuron to a stimulus. The stimulus states are denoted by the scalar variable
$\theta$, which can be either discrete or continuous, with a prior probability (or density)
$p(\theta)$. We denote the probability (or the density) of $\mathbf{r}$ given a stimulus
$\theta$ by $P(\mathbf{r}|\theta)$. In this paper we will focus on the case of statistically
independent neurons (given a stimulus $\theta$), namely,

$$P(\mathbf{r}|\theta) = \prod_{i=1}^{N} P(r_i|\theta). \tag{1}$$

An important issue is how to quantify the efficiency of the coding of $\theta$ in the
population responses. For probabilities that are differentiable functions of a continuously
varying stimulus, a well-known measure of the efficiency of the population code is the FI.
In our case it is

$$J(\theta) = \sum_{i=1}^{N} \left\langle \left( \frac{\partial \ln P(r_i|\theta)}{\partial \theta} \right)^{2} \right\rangle_{r_i|\theta}, \tag{2}$$

where $\langle \cdots \rangle_{r_i|\theta}$ denotes an average with respect to
$P(r_i|\theta)$. The FI is clearly an extensive quantity. Here we study an alternative
measure of efficiency, the MI of the system, which is the average amount of information on
$\theta$ that is added by observing the response, $\mathbf{r}$. It is useful to define a
local MI, $I(\theta)$,

$$I(\theta) = \left\langle \log_2 \frac{P(\mathbf{r}|\theta)}{P(\mathbf{r})} \right\rangle_{\mathbf{r}|\theta}, \tag{3}$$

where $P(\mathbf{r}) = \sum_{\theta} P(\mathbf{r}|\theta)\, p(\theta)$, and
$\langle \cdots \rangle_{\mathbf{r}|\theta}$ denotes an average w.r.t.
$P(\mathbf{r}|\theta)$. The full MI, $I$, is

$$I = \langle I(\theta) \rangle_{\theta}, \tag{4}$$

where $\langle \cdots \rangle_{\theta}$ is an average over the stimulus distribution.
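A central quantity is the log-likelihood variable,

$$S(\mathbf{r},\phi,\theta) = \ln \frac{P(\mathbf{r}|\phi)}{P(\mathbf{r}|\theta)} = \sum_{i=1}^{N} S_i(r_i,\phi,\theta), \tag{5}$$

where $S_i(r_i,\phi,\theta) = \ln [P(r_i|\phi)/P(r_i|\theta)]$. The local MI, $I(\theta)$,
can be written as

$$I(\theta) = -\int \prod_{\phi} dS_\phi \, P_\theta\{S_\phi\} \, \log_2 \langle \exp(S_\phi) \rangle_{\phi}, \tag{6}$$

where $P_\theta\{S_\phi\}$ is the distribution of $S_\phi = S(\mathbf{r},\phi,\theta)$,
calculated with respect to $P(\mathbf{r}|\theta)$. For large $N$ this distribution is
centered around the mean value of $S_\phi$,

$$\langle S_\phi \rangle_{\mathbf{r}|\theta} = -D_{KL}(\phi\|\theta) = -\sum_{i=1}^{N} \left\langle \ln \frac{P(r_i|\theta)}{P(r_i|\phi)} \right\rangle_{r_i|\theta}, \tag{7}$$

where $D_{KL}(\phi\|\theta) > 0$ is the relative entropy of $P(\mathbf{r}|\phi)$ and
$P(\mathbf{r}|\theta)$, also known as the Kullback-Leibler (KL) distance between the two
distributions. The correlation matrix of the fluctuations
$\delta S_\phi = S_\phi + D_{KL}(\phi\|\theta)$ is also of order $N$, and is given by

$$C(\phi,\psi) = \langle \delta S_\phi \, \delta S_\psi \rangle_{\mathbf{r}|\theta}. \tag{8}$$
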
We first discuss the MI of a large system with a continuously varying stimulus. Here, all
averages over stimulus space stand for integrals with a density $p(\theta)$, and we assume
that the probabilities vary smoothly with $\theta$. In this case, the dominant contribution
to Eq. (6) comes from values of $S_\phi$ near their mean value and from $\phi$ which is near
$\theta$, i.e., $|\phi - \theta| \lesssim 1/\sqrt{N}$. This is because
$S_\theta = S(\mathbf{r},\theta,\theta) = 0$, while for $\phi$ that is far from $\theta$,
$S_\phi \sim -D_{KL}(\phi\|\theta)$, which is large and negative. For small magnitudes of
$\delta\phi = \phi - \theta$ and $\delta\psi = \psi - \theta$ we can write

$$D_{KL}(\phi\|\theta) \approx \tfrac{1}{2} J(\theta)\, \delta\phi^{2}, \qquad C(\phi,\psi) \approx J(\theta)\, \delta\phi\, \delta\psi, \tag{9}$$

where $J(\theta)$ is the FI, Eq. (2). The low rank form of the correlation matrix in Eq. (9)
implies that the fluctuations $\delta S_\phi$ can be described as
$z \sqrt{J(\theta)}\, \delta\phi$, where by the central limit theorem $z$ is a Gaussian
variable with zero mean and unit variance. Substituting this in Eq. (6) yields,
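
$$I(\theta) = -\int Dz \, \log_2 \int d\phi \, p(\phi) \exp\left( z \sqrt{J(\theta)}\, \delta\phi - \tfrac{1}{2} J(\theta)\, \delta\phi^{2} \right), \tag{10}$$

where $Dz = dz \exp(-z^{2}/2)/\sqrt{2\pi}$. Evaluating the integrals in the limit of a large
$J$, and substituting in Eq. (4), yields

$$I = H_\theta - \frac{1}{2} \left\langle \log_2 \frac{2\pi e}{J(\theta)} \right\rangle_{\theta}, \tag{11}$$

where $H_\theta = -\int d\theta \, p(\theta) \log_2 p(\theta)$, in agreement with previous
results.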
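
The large-$J$ evaluation can be spelled out as follows. This is only a sketch under the
Gaussian approximation described above, $S_\phi \approx z\sqrt{J(\theta)}\,\delta\phi -
\frac{1}{2} J(\theta)\,\delta\phi^{2}$, with the additional assumption that the prior
$p(\phi)$ is smooth enough to be replaced by $p(\theta)$ over the narrow range
$|\delta\phi| \sim 1/\sqrt{J}$ that dominates the integral.

```latex
\begin{align*}
\int d\phi \, p(\phi)\, e^{\, z\sqrt{J}\,\delta\phi - \frac{1}{2} J\,\delta\phi^{2}}
  &\approx p(\theta) \sqrt{\frac{2\pi}{J(\theta)}}\; e^{z^{2}/2}, \\
I(\theta)
  &\approx -\log_2 p(\theta) - \frac{1}{2}\log_2 \frac{2\pi}{J(\theta)}
     - \frac{\langle z^{2} \rangle}{2 \ln 2}
   = -\log_2 p(\theta) - \frac{1}{2}\log_2 \frac{2\pi e}{J(\theta)}, \\
I = \langle I(\theta) \rangle_\theta
  &\approx H_\theta - \frac{1}{2}\left\langle \log_2 \frac{2\pi e}{J(\theta)} \right\rangle_\theta .
\end{align*}
```

The first line follows from completing the square in $\delta\phi$, and the second uses
$\langle z^{2} \rangle = 1$. Since $J(\theta)$ is extensive, the last term grows as
$\frac{1}{2}\log_2 N$, which is the logarithmic increase of the MI with $N$.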

We now turn to the more difficult case of a discrete valued stimulus which takes the values
$\{\theta_i\}_{i=1}^{M}$, where $M$ remains finite as $N \to \infty$. The term
$\phi = \theta$ in the average $\langle \exp S_\phi \rangle_{\phi}$ in Eq. (6) is
$p(\theta)$, yielding a total contribution to $I$ which is the stimulus entropy
$H_\theta = -\sum_{\theta} p(\theta) \log_2 p(\theta)$. Naively, we would therefore expect
that the main contribution to $-\log_2 p(\theta) - I(\theta)$ comes from the typical values
of $S_\phi$, namely, $-D_{KL}(\phi\|\theta)$, for a state $\phi$ which is closest to
$\theta$. Such a contribution would be proportional to $\exp(-D_{KL}(\phi\|\theta))$.
However, we find that in fact the dominant contribution comes from rare values of
$\mathbf{r}$ such that $S_\phi = 0$ for one of the states $\phi$ ($\phi \neq \theta$). This
is because, as we will show, although this regime has an exponentially small probability,
its contribution to $I$ is exponentially larger than that of the typical value of $S_\phi$,
making it the dominant correction to $I - H_\theta$.

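To evaluate Eq. (6) in the discrete case we use an integral representation of the
distribution of $S_\phi = S(\mathbf{r},\phi,\theta)$,

$$P_\theta\{S_\phi\} = \int \prod_{\phi \neq \theta} \frac{dY_\phi}{2\pi} \exp\{ -F_\theta(Y,S) \}, \tag{12}$$

$$F_\theta = -\sum_{i=1}^{N} \ln \left\langle \exp\Big( i \sum_{\phi \neq \theta} Y_\phi S_i(r_i,\phi,\theta) \Big) \right\rangle_{r_i|\theta} + i \sum_{\phi \neq \theta} Y_\phi S_\phi. \tag{13}$$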

Note that there is no integration over variables with $\phi = \theta$, since
$S_\theta = S(\mathbf{r},\theta,\theta) = 0$. The large $N$ limit of Eqs. (12) and (13) is
evaluated by the saddle point method. Solving the saddle point equations for $\{S_\phi\}$
and $\{Y_\phi\}$ we find that there are $M - 1$ saddle points, each of which is
characterized by having one of the $S_\phi$ of $O(1)$ while the remaining ones are negative
and of $O(N)$. The corresponding auxiliary variable is $Y_\phi = -i\alpha$, where $\alpha$
is a real number of order 1, while the remaining $Y_\psi$ are zero. At this saddle point,
the value of $F_\theta$, Eq. (13), is

$$F_\theta = D_\alpha(\phi\|\theta) \equiv -\sum_{i=1}^{N} \ln \left\langle \exp( \alpha S_i(r_i,\phi,\theta) ) \right\rangle_{r_i|\theta}, \tag{14}$$

where the order parameter $\alpha$ is evaluated by maximizing Eq. (14),

$$\frac{\partial D_\alpha(\phi\|\theta)}{\partial \alpha} = 0. \tag{15}$$

We will denote the maximum value of $D_\alpha$ by $D_C(\phi,\theta)$. Out of these $M - 1$
saddle points the dominant one is that with the smallest $D_C(\phi,\theta)$, yielding

$$\ln[ -\log_2 p(\theta) - I(\theta) ] \simeq -\min_{\phi \neq \theta} D_C(\phi,\theta). \tag{16}$$

Finally, the full MI is given by

$$\ln( H_\theta - I ) \simeq -\min_{\phi \neq \theta} D_C(\phi,\theta) - \frac{1}{2} \ln N + A, \tag{17}$$

where the $\ln N$ and the constant $A$ correction terms come from the Gaussian integration
around the saddle point of Eqs. (6) and (12). The calculation of the constant $A$ will be
presented elsewhere. Finally, examining Eq. (12) at the saddle point, it can be seen that
$\exp(-D_C(\phi,\theta))$ equals the probability that $S_\phi/N \approx 0$ for a large $N$.
This implies, as we have mentioned above, that the rate of saturation of the MI is dominated
by the probability of the rare event that one of the log-likelihood ratios is close to zero.
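
The link between $\exp(-D_C)$ and the probability of a non-negative log-likelihood ratio
can be checked numerically in the simplest setting. The sketch below assumes $N$ identical
binary neurons, which is an illustrative simplification and not the heterogeneous population
treated in this paper. The firing probabilities `p_theta` and `p_phi` are arbitrary choices
and all function names are hypothetical. It computes $P(S_\phi \geq 0 \,|\, \theta)$ exactly
from the binomial distribution and compares its decay rate with the per-neuron Chernoff
Distance.

```python
import math


def chernoff_bernoulli(p_phi, p_theta, iters=200):
    """Per-neuron Chernoff distance between Bernoulli(p_phi) and Bernoulli(p_theta).

    Maximizes D_alpha = -ln sum_r P(r|phi)^alpha P(r|theta)^(1-alpha) over
    0 <= alpha <= 1 by ternary search (D_alpha is concave in alpha).
    """
    def d_alpha(a):
        s = (p_phi ** a) * (p_theta ** (1 - a)) \
            + ((1 - p_phi) ** a) * ((1 - p_theta) ** (1 - a))
        return -math.log(s)

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if d_alpha(m1) < d_alpha(m2):
            lo = m1
        else:
            hi = m2
    return d_alpha(0.5 * (lo + hi))


def log_prob_s_nonnegative(n, p_phi, p_theta):
    """ln P(S >= 0 | theta) for n identical binary neurons, computed exactly."""
    w1 = math.log(p_phi / p_theta)              # S_i when r_i = 1
    w0 = math.log((1 - p_phi) / (1 - p_theta))  # S_i when r_i = 0
    log_terms = []
    for k in range(n + 1):                      # k = number of active neurons
        if k * w1 + (n - k) * w0 >= 0:
            log_terms.append(
                math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p_theta) + (n - k) * math.log(1 - p_theta))
    m = max(log_terms)                          # log-sum-exp for numerical safety
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


p_theta, p_phi = 0.2, 0.6                       # assumed example values
d_c = chernoff_bernoulli(p_phi, p_theta)
for n in (50, 200, 800, 2000):
    rate = -log_prob_s_nonnegative(n, p_phi, p_theta) / n
    print(f"N={n:5d}  -ln P(S>=0)/N = {rate:.4f}   D_C per neuron = {d_c:.4f}")
```

The printed rate should approach the Chernoff Distance from above as $N$ grows, with the
residual gap shrinking roughly like $\ln N / N$.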

The quantity $D_C(\phi,\psi)$ derived in the above theory is identical to the Chernoff
Distance (or Chernoff Information) between the two distributions $P(\mathbf{r}|\phi)$ and
$P(\mathbf{r}|\psi)$. More generally, the Chernoff Distance between two arbitrary
distributions $P(x)$ and $Q(x)$ is defined [11] as

$$D_C(Q,P) = \max_{\alpha} D_\alpha(Q\|P), \tag{18}$$

where

$$D_\alpha(Q\|P) = -\ln \sum_{x} Q(x)^{\alpha} P(x)^{1-\alpha}. \tag{19}$$

The $D_\alpha(Q\|P)$ are proportional to the family of Renyi distances. $D_\alpha(Q\|P)$
vanishes at $\alpha = 0$ and $\alpha = 1$. It is positive (if $Q \neq P$) for
$0 < \alpha < 1$, with a maximum at $\alpha^*$, $0 < \alpha^* < 1$, and is negative outside
this regime. It is related to the KL distance through its slope at $\alpha = 0$, i.e.,
$dD_\alpha(Q\|P)/d\alpha|_{\alpha=0} = D_{KL}(Q\|P)$. The Chernoff Distance chooses the
value of $\alpha$ which maximizes $D_\alpha(Q\|P)$. This value is not constant but depends
on the pair of distributions $Q$ and $P$. Note that in the case of a family of
distributions parameterized by a continuous parameter $\theta$, the Chernoff Distance is
related to the FI through
$4 D_C(\theta,\theta+\delta\theta) \approx D_{KL}(\theta,\theta+\delta\theta) \approx J(\theta)(\delta\theta)^{2}/2$.
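
These properties of $D_\alpha$ are easy to verify numerically for small discrete
distributions. The following is a minimal sketch, not taken from the paper: the function
names and the example distributions `Q` and `P` are hypothetical, the maximization uses a
simple ternary search (valid because $D_\alpha$ is concave in $\alpha$), and `d_kl` follows
the convention used in this paper, in which $D_{KL}(Q\|P)$ is an average taken with respect
to $P$.

```python
import math


def d_alpha(q, p, alpha):
    """D_alpha(Q||P) = -ln sum_x Q(x)^alpha P(x)^(1-alpha)."""
    return -math.log(sum((qx ** alpha) * (px ** (1 - alpha))
                         for qx, px in zip(q, p)))


def chernoff(q, p, iters=200):
    """Return (alpha*, D_C(Q,P)) by ternary search over 0 <= alpha <= 1."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if d_alpha(q, p, m1) < d_alpha(q, p, m2):
            lo = m1
        else:
            hi = m2
    a = 0.5 * (lo + hi)
    return a, d_alpha(q, p, a)


def d_kl(q, p):
    """D_KL(Q||P) in this paper's convention: sum_x P(x) ln[P(x)/Q(x)]."""
    return sum(px * math.log(px / qx) for qx, px in zip(q, p) if px > 0)


# Arbitrary example distributions over three states (assumed values).
Q = [0.7, 0.2, 0.1]
P = [0.2, 0.3, 0.5]

a_star, d_c = chernoff(Q, P)
eps = 1e-6
slope_at_zero = (d_alpha(Q, P, eps) - d_alpha(Q, P, 0.0)) / eps
print("D_alpha at alpha = 0, 1:", d_alpha(Q, P, 0.0), d_alpha(Q, P, 1.0))
print("alpha* =", a_star, " D_C =", d_c)
print("slope at alpha = 0:", slope_at_zero, " D_KL(Q||P) =", d_kl(Q, P))

# Small-difference limit for a Bernoulli family with parameter theta.
theta, delta = 0.3, 0.01
Qb = [theta, 1 - theta]
Pb = [theta + delta, 1 - theta - delta]
fisher = 1.0 / (theta * (1 - theta))            # FI of a single Bernoulli variable
print("4 D_C =", 4 * chernoff(Qb, Pb)[1],
      " D_KL =", d_kl(Qb, Pb),
      " J (delta theta)^2 / 2 =", fisher * delta ** 2 / 2)
```

For nearby distributions the three printed quantities on the last line should agree to
within a few percent, with the maximizing $\alpha^*$ close to $1/2$.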

Although $D_\alpha(Q\|P)$ is not symmetric with respect to $Q$ and $P$ for general
$\alpha$, it obeys the symmetry $D_\alpha(Q\|P) = D_{1-\alpha}(P\|Q)$, which implies that
$D_{\alpha^*}(Q\|P) = D_{1-\alpha^*}(P\|Q)$ is also the maximum of $D_\alpha(P\|Q)$. Thus,
the Chernoff Distance of $Q$ and $P$ is a symmetric function of the two distributions. It
is smaller than $D_{KL}$, and in addition, it is less sensitive than the KL distance to
outlier states. In particular, a single state which has a nonzero probability $P$ but zero
probability $Q$ causes $D_{KL}(Q\|P)$ to diverge, whereas $D_C(Q,P)$ remains finite. In
fact, it diverges only when the two distributions have zero overlap, i.e., the intersection
of their supports is empty.
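
The symmetry and the robustness to outlier states can be illustrated with a few lines of
code. This is a sketch under stated assumptions: the helper names and the example
distributions are hypothetical, the sum defining $D_\alpha$ is restricted to states where
both probabilities are nonzero, and `d_kl` uses this paper's convention, in which the
average is taken with respect to the second argument.

```python
import math


def d_alpha(q, p, alpha):
    """D_alpha(Q||P), summing only over states where both Q and P are nonzero."""
    s = sum((qx ** alpha) * (px ** (1 - alpha))
            for qx, px in zip(q, p) if qx > 0 and px > 0)
    return -math.log(s) if s > 0 else math.inf


def chernoff(q, p, iters=200):
    """D_C(Q,P) = max over 0 <= alpha <= 1 of D_alpha(Q||P), by ternary search."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if d_alpha(q, p, m1) < d_alpha(q, p, m2):
            lo = m1
        else:
            hi = m2
    return d_alpha(q, p, 0.5 * (lo + hi))


def d_kl(q, p):
    """D_KL(Q||P) = sum_x P(x) ln[P(x)/Q(x)]; infinite if P(x) > 0 where Q(x) = 0."""
    total = 0.0
    for qx, px in zip(q, p):
        if px > 0:
            if qx == 0:
                return math.inf
            total += px * math.log(px / qx)
    return total


# Symmetry: swapping the two distributions leaves the Chernoff distance unchanged.
Q = [0.7, 0.2, 0.1]
P = [0.2, 0.3, 0.5]
print("D_C(Q,P) =", chernoff(Q, P), " D_C(P,Q) =", chernoff(P, Q))
print("D_KL(Q||P) =", d_kl(Q, P), " D_KL(P||Q) =", d_kl(P, Q))

# Outlier state: the third state has nonzero P but zero Q.
Q_out = [0.5, 0.5, 0.0]
P_out = [0.3, 0.3, 0.4]
print("outlier: D_KL =", d_kl(Q_out, P_out), " D_C =", chernoff(Q_out, P_out))

# Zero overlap: the supports are disjoint, so the Chernoff distance diverges too.
print("disjoint: D_C =", chernoff([1.0, 0.0], [0.0, 1.0]))
```

In the outlier example the maximum of $D_\alpha$ sits at the edge $\alpha \to 0$, so the
finite value returned is the boundary value of the search interval.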



Equation (17) implies that the Chernoff Distance controls the rate of saturation of the MI.
In order to test our theory we have studied the case of a population of statistically
independent, binary neurons, where $r_i \in \{0, 1\}$, and a stimulus which can take three
values: $\theta_1$, $\theta_2$ and $\theta_3$. Each neuron has a preferred stimulus, denoted
by $\theta^{i}$. The mean value of $r_i$ (i.e., the probability that $r_i = 1$) is

$$f_i(\theta) = Z_i(\theta) + T \, \delta_{\theta,\theta^{i}}. \tag{20}$$

For each neuron, $\theta^{i}$ is chosen at random from $\{\theta_1, \theta_2, \theta_3\}$
with equal probability, and the $Z_i(\theta)$ are chosen at random uniformly from $[a, b]$,
with $T > b - a$. Thus, the parameter $T$ measures the selectivity of the responses to the
stimulus values. In this case, there is no statistical difference between the response
probability of the population as a whole to the three stimuli, and changing the stimulus
corresponds to permuting the mean responses of the individual neurons in the population.
This is common in biological situations where different stimuli (such as different angles,
spatial positions or abstract objects) elicit activity profiles that are similar in shape
but shifted in position across the network.
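
For concreteness, the model can be set up in a few lines. The sketch below is an
illustration and not the code used for the figures: the function names, the random seed and
the use of stimulus indices 0, 1, 2 are assumptions, while the default values of `T`, `a`
and `b` are those quoted in the caption of Fig. 1. For independent neurons the quantity
$D_\alpha$ is a sum of single-neuron terms, which is then maximized over $\alpha$.

```python
import math
import random


def make_population(n, n_stimuli=3, T=0.75, a=0.05, b=0.15, seed=0):
    """Return f[i][s] = P(r_i = 1 | stimulus s) for n binary neurons.

    Each neuron gets a random preferred stimulus; its firing probability is a
    uniform baseline Z_i(s) in [a, b], raised by T at the preferred stimulus.
    """
    rng = random.Random(seed)
    f = []
    for _ in range(n):
        preferred = rng.randrange(n_stimuli)
        f.append([rng.uniform(a, b) + (T if s == preferred else 0.0)
                  for s in range(n_stimuli)])
    return f


def population_d_alpha(f, phi, theta, alpha):
    """D_alpha(phi||theta) as a sum over independent binary neurons."""
    total = 0.0
    for fi in f:
        q1, p1 = fi[phi], fi[theta]
        total -= math.log((q1 ** alpha) * (p1 ** (1 - alpha))
                          + ((1 - q1) ** alpha) * ((1 - p1) ** (1 - alpha)))
    return total


def population_chernoff(f, phi, theta, iters=100):
    """Chernoff distance between the population responses to phi and theta."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if population_d_alpha(f, phi, theta, m1) < population_d_alpha(f, phi, theta, m2):
            lo = m1
        else:
            hi = m2
    return population_d_alpha(f, phi, theta, 0.5 * (lo + hi))


f = make_population(40)
for n in (10, 20, 40):
    # Use the first n neurons so that the populations are nested.
    print(f"N={n:3d}  D_C(0,1) = {population_chernoff(f[:n], 0, 1):.3f}")
```

Because every neuron contributes a non-negative term for $0 \le \alpha \le 1$, the
population Chernoff Distance can only grow as neurons are added.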




FIG. 1. The Mutual Information between a population of $N$ independent binary neurons and
an $M$-state stimulus, with $M = 3$, as a function of $N$. The response parameters $T$,
$a$, and $b$ are 0.75, 0.05, and 0.15, respectively. The dots are the exact numerical
calculation. The line is the approximation, Eq. (17). We have included a constant
$\ln(M - 1)$ to account for the $M - 1$ equal contributions to $I(\theta)$ because of the
symmetry of the stimuli. The inset compares the two results on a log plot.

Figure 1 shows good agreement between the results of the exact numerical calculation of the
MI of this model up to size $N = 25$, and the asymptotic result of Eqs. (14)-(17),
evaluated for the above distributions. Note that because of the symmetry between the
different stimuli in this example, $D_C(\theta,\theta')$ is the same for all pairs of
stimuli, $\theta \neq \theta'$, so that they all contribute equally to $I(\theta)$.
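
An exact calculation of this kind can be done by brute-force enumeration of all $2^N$
binary response vectors, which is feasible only for small $N$. The following sketch is not
the code used for Fig. 1: the function names, the seed and the uniform prior over three
stimulus indices are assumptions, and the population size is kept at $N \le 10$ so that the
example runs quickly.

```python
import itertools
import math
import random


def make_population(n, n_stimuli=3, T=0.75, a=0.05, b=0.15, seed=0):
    """Return f[i][s] = P(r_i = 1 | stimulus s), baseline in [a, b] plus T if preferred."""
    rng = random.Random(seed)
    f = []
    for _ in range(n):
        preferred = rng.randrange(n_stimuli)
        f.append([rng.uniform(a, b) + (T if s == preferred else 0.0)
                  for s in range(n_stimuli)])
    return f


def exact_mi(f, prior=None):
    """Exact mutual information in bits, summing over all 2^N response vectors."""
    n, m = len(f), len(f[0])
    prior = prior or [1.0 / m] * m
    mi = 0.0
    for r in itertools.product((0, 1), repeat=n):
        # P(r | s) for each stimulus s, using statistical independence.
        like = [math.prod(fi[s] if ri else 1.0 - fi[s] for fi, ri in zip(f, r))
                for s in range(m)]
        p_r = sum(ps * ls for ps, ls in zip(prior, like))
        for ps, ls in zip(prior, like):
            if ls > 0:
                mi += ps * ls * math.log2(ls / p_r)
    return mi


f = make_population(10)
h_theta = math.log2(3)                 # entropy of a uniform three-state stimulus
for n in (2, 4, 6, 8, 10):
    mi = exact_mi(f[:n])               # nested populations: first n neurons
    print(f"N={n:2d}  I = {mi:.4f} bits   H - I = {h_theta - mi:.4f}")
```

Plotting $\ln(H_\theta - I)$ against $N$ from such a calculation is one way to compare the
exact result with the asymptotic saturation rate.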
The importance of $D_C(\phi,\theta)$ in characterizing the efficiency of the population
code is manifest not only in the MI of the system but also in the accuracy of the
discrimination between stimuli on the basis of observation of the population responses.
Plausible discriminators often base their discrimination between a pair of stimuli on the
log-likelihood ratio of the corresponding distributions. In particular, the
Maximum-Likelihood (ML) discriminator makes a deterministic decision between a pair of
stimuli $\theta$ and $\phi$ upon observing $\mathbf{r}$, according to whether the
log-likelihood ratio, $S(\mathbf{r},\phi,\theta)$, is larger or smaller than zero. As
outlined below, our theory yields that the probability of confusion of an ML discrimination
between a pair of stimuli, $\theta$ and $\phi$, is dominated by the probability that
$S(\mathbf{r},\phi,\theta)$ is close to zero, and thus is determined by $D_C(\phi,\theta)$.

FIG. 2. The probability of confusion in the case of an independent population of Poissonian
neurons, incurred by an ML discriminator whose task is to discriminate between three
stimuli. The mean response of each neuron is given by Eq. (20), with $T = 3$, and
$Z_i(\theta)$ sampled uniformly between $a = 17$ and $b = 20$. Dots are the ML decision
error calculated by averaging over 5000 samples of $\mathbf{r}$. The line is the prediction
of Eq. (22).

To evaluate the discrimination error, we write the average probability of confusion, $P_C$,
of an ML discrimination as $P_C = \langle P_C(\theta) \rangle_{\theta}$, where

$$P_C(\theta) = 1 - \int \prod_{\phi \neq \theta} dS_\phi \, P_\theta\{S_\phi\} \prod_{\phi \neq \theta} \Theta(-S_\phi), \tag{21}$$

where $\Theta(x)$ is the Heaviside step function, and $P_\theta\{S_\phi\}$ is given by
Eq. (12). Using the saddle point methods as described above, we find that in a large system
the dominant contribution to the integrals over $S_\phi$ comes from the edge points where
one $S_\phi$ is close to zero. Evaluating the saddle point equations under this condition
yields

$$\ln P_C = -\min_{\phi \neq \theta} D_C(\phi,\theta) - \frac{1}{2} \ln N + A'. \tag{22}$$

An example is shown in Fig. 2, where we have computed the confusion error of an ML
discriminator between two stimuli in the case of a population of neurons responding to
three stimuli as described above, except that in this case the neurons are Poissonian with
means $f_i(\theta)$ of the form Eq. (20). The numerical results obtained by simulating the
ML discriminator are in very good agreement with the prediction of Eq. (22).
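
A simulation of this kind can be sketched as follows. This is an illustrative Monte Carlo
estimate and not the code behind Fig. 2: the function names, seeds, number of trials and the
choice of the pair of stimuli (indices 0 and 1) are assumptions, Poisson variates are drawn
with Knuth's multiplication method because the Python standard library has no Poisson
sampler, and only the parameter values `T`, `a` and `b` are taken from the caption of
Fig. 2.

```python
import math
import random


def poisson_sample(rng, lam):
    """Draw a Poisson variate with mean lam (Knuth's method, fine for moderate lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def make_means(n, n_stimuli=3, T=3.0, a=17.0, b=20.0, seed=0):
    """Return means[i][s]: uniform baseline in [a, b] plus T at the preferred stimulus."""
    rng = random.Random(seed)
    means = []
    for _ in range(n):
        preferred = rng.randrange(n_stimuli)
        means.append([rng.uniform(a, b) + (T if s == preferred else 0.0)
                      for s in range(n_stimuli)])
    return means


def log_likelihood_ratio(r, means, phi, theta):
    """S(r, phi, theta) = ln P(r|phi) - ln P(r|theta) for independent Poisson neurons."""
    return sum(ri * math.log(m[phi] / m[theta]) - (m[phi] - m[theta])
               for ri, m in zip(r, means))


def confusion_rate(means, theta=0, phi=1, trials=1000, seed=1):
    """Fraction of trials in which the ML rule prefers phi although theta was shown."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        r = [poisson_sample(rng, m[theta]) for m in means]
        if log_likelihood_ratio(r, means, phi, theta) > 0:
            errors += 1
    return errors / trials


all_means = make_means(40)
for n in (5, 10, 20, 40):
    err = confusion_rate(all_means[:n])       # nested populations: first n neurons
    print(f"N={n:3d}  estimated confusion probability = {err:.3f}")
```

With a finite number of trials the estimate becomes unreliable once the error probability
drops well below the inverse of the trial count, so larger populations need more samples.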

In conclusion, we have shown the relation between the MI of a large population coding for a
stimulus and the distance between the response probabilities that are induced by the
different stimulus values. In the case of a continuous stimulus, the MI increases
logarithmically with $N$, for large $N$, and is related to the FI, which measures the rate
at which the distance vanishes for infinitesimally small stimulus differences. In the case
of a discrete stimulus, the MI saturates exponentially at a rate which is given by the
Chernoff Distance between the closest pair of population response probabilities. In
addition, we have shown that $D_C$ also determines the probability of discrimination error.
This extends the classical Large Deviation Theory results [11] to cases where the elements
of the population are not identical. Our finding that the Chernoff Distance controls both
the MI and the error probability for a large population that codes discrete stimuli
suggests that $D_C$ is a useful measure of the quality of neuronal population codes. We
hope that these results will provide tractable tools to study the nature of population
codes in the brain using experimental data on neuronal representations of sensory, motor
and cognitive events.